Identifier Namespaces in Mathematical Notation

Author

  • Alexey Grigorev

Abstract

In computer science, namespaces help to structure source code and organize it into hierarchies. Initially, programming languages had no concept of namespaces, and programmers had to manage the source code themselves to avoid name conflicts. Nowadays, however, namespaces are supported by the majority of modern programming languages. The concept is beneficial for mathematics as well: short one-symbol identifiers are very common in mathematics, and their meaning is often hard to grasp immediately. Introducing namespaces to mathematics would let us organize mathematical identifiers, and mathematicians would benefit from a hierarchical organization of knowledge in the same way programmers do. In addition, such a structure would make it easier to understand the meaning of each identifier in a document. In this thesis, we look at the problem of assigning each identifier of a document to a namespace. At the moment, there is no dataset in which all identifiers are grouped into namespaces, so we need to create such a dataset ourselves. Preparing namespaces manually is hard because it requires a lot of time and effort. However, it can be done automatically, and we propose a method for automatic namespace discovery from a collection of documents. To do that, we need to find groups of documents that use identifiers in the same way, which can be done with cluster analysis methods. We argue that documents can be represented by the identifiers they contain, an approach similar to representing textual information in the Vector Space Model. Because of this, we can apply traditional document clustering techniques to namespace discovery. To evaluate the results, we use category information and look for pure “namespace-defining” clusters: clusters in which all documents belong to the same category.
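The identifier-based Vector Space Model described above can be sketched as follows. This is a minimal illustration, assuming each document has already been reduced to a list of the identifiers it contains (identifier extraction is not shown). Weights are TF-IDF with a smoothed IDF, and spherical k-means (k-means with cosine similarity) stands in as one example of a traditional document clustering technique; it is not necessarily the algorithm used in the thesis. The function names and the first-k seeding are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Represent each document by the identifiers it contains.

    docs: one list of identifier strings per document.
    Returns one L2-normalized sparse vector (dict: identifier -> weight) per document.
    """
    n = len(docs)
    # Document frequency: the number of documents in which each identifier occurs.
    df = Counter(ident for doc in docs for ident in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Term frequency times a smoothed inverse document frequency.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def spherical_kmeans(vectors, k, iterations=10):
    """Cluster unit vectors by cosine similarity; returns one label per document.

    Hypothetical deterministic seeding: the first k documents are the initial centroids.
    """
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        # Update step: each centroid becomes the normalized sum of its members.
        for j in range(k):
            total = defaultdict(float)
            for v, label in zip(vectors, labels):
                if label == j:
                    for t, w in v.items():
                        total[t] += w
            norm = math.sqrt(sum(w * w for w in total.values()))
            if norm > 0:  # keep the old centroid if the cluster is empty
                centroids[j] = {t: w / norm for t, w in total.items()}
    return labels
```

For example, with the documents `[["E", "m", "c"], ["f", "x", "y"], ["E", "m", "c", "v"], ["f", "x", "dx"]]` and `k=2`, the two documents sharing `E`, `m`, `c` land in one cluster and the two sharing `f`, `x` in the other, giving labels `[0, 1, 0, 1]`. Seeding with the first k documents keeps the sketch deterministic but makes it sensitive to document order.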
In the experiments, we look for algorithms that discover as many namespace-defining clusters as possible. Because the problem is new, there is no gold-standard dataset, which makes it hard to evaluate the performance of our method. To overcome this, we first run our experiments on Java source code, since it contains namespace information, and verify that our method can partially recover namespaces from source code using only information about identifiers. The algorithms are then evaluated on the English Wikipedia, where the proposed method can extract namespaces on a variety of topics. After extraction, the namespaces are organized into a hierarchical structure by using existing…
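The evaluation criterion used in the Java experiment can be sketched as a check over cluster assignments: each class's package serves as its known namespace, and a cluster counts as namespace-defining when all of its members come from the same package. This is a minimal sketch under assumptions; the `min_size` threshold and the `recovered_fraction` summary are hypothetical additions for illustration, not measures taken from the thesis.

```python
from collections import defaultdict


def namespace_defining_clusters(labels, categories, min_size=2):
    """Return {cluster label: category} for every pure cluster.

    labels: cluster label of each document (e.g. a class or an article).
    categories: known category of each document, e.g. a Java package name.
    min_size: hypothetical threshold so that tiny clusters, which are
        trivially pure, are not counted.
    """
    members = defaultdict(list)
    for label, category in zip(labels, categories):
        members[label].append(category)
    return {
        label: cats[0]
        for label, cats in members.items()
        # Pure cluster: every member shares one category.
        if len(cats) >= min_size and len(set(cats)) == 1
    }


def recovered_fraction(pure, categories):
    """Hypothetical summary: share of known categories found by at least one pure cluster."""
    known = set(categories)
    return len(set(pure.values())) / len(known) if known else 0.0
```

For instance, with labels `[0, 0, 1, 1, 2, 2]` and packages `["java.util", "java.util", "java.io", "java.io", "java.net", "java.io"]`, clusters 0 and 1 are namespace-defining (`java.util` and `java.io`), cluster 2 mixes two packages, and `recovered_fraction` returns 2/3 because `java.net` is not recovered.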

Journal:
  • CoRR

Volume: abs/1601.03354

Publication date: 2016